Assessing relevant molecular differences between human-induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs) is important, given that such differences may impact their potential therapeutic use. Controversy surrounds recent gene expression studies comparing hiPSCs and hESCs. Here, we present an in-depth quantitative mass spectrometry-based analysis of hESCs, two different hiPSCs and their precursor fibroblast cell lines. Our comparisons confirmed the high similarity of hESCs and hiPSCs at the proteome level, as 97.8% of the proteins were found unchanged. Nevertheless, a small group of 58 proteins, mainly related to metabolism, antigen processing and cell adhesion, was found to be significantly differentially expressed between hiPSCs and hESCs. A comparison of the regulated proteins with previously published transcriptomic studies showed a low overlap, highlighting the emerging notion that differences between the two types of pluripotent cell lines reflect experimental conditions rather than a recurrent molecular signature.
Molecular Systems Biology 7: 550; published online 22 November 2011; doi:10.1038/msb.2011.84 
Keywords: human embryonic stem cells; human-induced pluripotent stem cells; proteomics; quantitation



Introduction 

Human embryonic stem cells (hESCs) are capable of self-renewal and multi-lineage differentiation (i.e., pluripotency; Thomson et al, 1998). Owing to these two unique properties, they are considered one of the most promising sources for tissue replacement therapies. However, the use of hESCs raises numerous ethical issues, as they are derived from human embryos. Recently, reprogramming of somatic cells to an embryonic stem cell-like state, named induced pluripotent stem cells (iPSCs), was achieved through retroviral transduction of a defined set of transcription factors (Takahashi et al, 2007; Yu et al, 2007; Park et al, 2008b). To date, multiple somatic cell types from diverse adult tissues (i.e., of endodermal, mesodermal and ectodermal origin) have been successfully reprogrammed to iPSCs, including fibroblasts (Takahashi et al, 2007; Yu et al, 2007; Park et al, 2008b), blood cells (Loh et al, 2009), neural progenitors (Eminli et al, 2008) and fully differentiated lymphocytes (Hanna et al, 2008). Furthermore, multiple strategies have been proposed as alternatives to potentially harmful retroviruses, including drug-inducible systems (Hockemeyer et al, 2008), virus-free transposon-mediated delivery (Woltjen et al, 2009), recombinant proteins (Kim et al, 2009) and miRNAs (Miyoshi et al, 2011). Finally, human-induced pluripotent stem cells (hiPSCs) represent a unique tool to develop cellular models for many human diseases (Park et al, 2008a; Soldner et al, 2009).

Although current functional assays such as in-vitro differentiation, teratoma formation, chimera formation, germline contribution and tetraploid complementation (Jaenisch and Young, 2008) have confirmed the pluripotency of hiPSCs (Takahashi et al, 2007; Yu et al, 2007; Park et al, 2008b), there might still be significant differences when compared with their natural hESC counterparts. For instance, hiPSCs have been shown to differentiate less efficiently than hESCs (Feng et al, 2010; Hu et al, 2010). Consequently, an extensive molecular characterization of the differences and similarities between these two types of pluripotent cell lines seems to be a prerequisite for any clinical application. Although great efforts have been made to address how similar hESCs and hiPSCs are, the definitive answer to this fundamental question is still the subject of active debate (Guenther et al, 2010; Newman and Cooper, 2010; Chin et al, 2009, 2010b). Using microarray-based approaches, several studies have reported residual levels of transcriptional memory of the parental somatic cell line in the reprogrammed hiPSCs (Chin et al, 2009; Marchetto et al, 2009; Ghosh et al, 2010; Ohi et al, 2011). However, it has also been shown that these gene expression profiles could represent lab-specific signatures due to in-vitro microenvironmental conditions rather than a recurrent molecular signature across different hiPS cell lines (Guenther et al, 2010; Newman and Cooper, 2010). In addition, epigenetic analyses have documented significant differences in the DNA methylation patterns of hiPSCs and hESCs (Deng et al, 2009; Doi et al, 2009; Lister et al, 2009; Kim et al, 2010; Polo et al, 2010; Bock et al, 2011). In fact, the transcriptional memory of hiPSCs could be partially explained by incomplete DNA methylation at the promoter regions of somatic genes (Ohi et al, 2011). Non-coding miRNAs have an important role in the mechanisms underlying reprogramming (Samavarchi-Tehrani et al, 2010; Subramanyam et al, 2011), and they can replace the ectopic expression of transcription factors to generate iPSCs with even higher efficiency (Anokye-Danso et al, 2011). Accordingly, the miRNA profiles of hESCs and hiPSCs have been compared, revealing a signature in the expression of the miR-371/372/373 cluster (Wilson et al, 2009). Finally, genetic integrity has also been studied, showing that the reprogramming process could induce several genomic abnormalities (Mayshar et al, 2010; Hussein et al, 2011; Laurent et al, 2011).

Despite intensive efforts in molecular characterization, direct assessment of protein levels has yet to be incorporated into these integrative systems-level analyses. Protein levels are tuned by intricate mechanisms of gene expression regulation, and it has recently been documented that mRNA and protein levels correlate poorly in mouse ESCs (Lu et al, 2009). Proteomics is, however, more labor-intensive and often lacks the profiling depth that can be obtained at the transcript level. Mass spectrometry (MS)-based proteomics is currently the most powerful tool to globally profile proteomes and has also been used to study different aspects of stem cell biology (Swaney et al, 2009; Van Hoof et al, 2009; Rigbolt et al, 2011). Here, we use in-depth quantitative proteomics to gain insights into the differences and similarities in the protein content of two hiPS cell lines (IMR90 and 4Skin), their precursor fibroblast cell lines and one hES cell line (HES-3), all grown and maintained under the same experimental conditions, providing novel molecular signatures that may help fill a gap in our understanding of pluripotency.

Results 

Confirmation of pluripotency and experimental design

To study the degree of similarity between hiPSCs and hESCs at the protein level, two MS-based proteomic experiments using two different hiPS cell lines were conducted (Figure 1). In Experiment 1, IMR90_iPS cells were compared with hESCs (HES-3) and with the parental cell line, IMR90_Fibro. In Experiment 2, 4Skin_iPS cells, hESCs (HES-3) and the somatic cells, 4Skin_Fibro, were analyzed. The two hiPS cell lines were derived by reprogramming IMR90 fetal fibroblasts and foreskin fibroblasts, respectively, through ectopic expression of SOX2, OCT4, NANOG and LIN28 transgenes carried by retroviruses (Yu et al, 2007). Upon extended culture, hiPSCs adopt a gene expression profile that more closely resembles that of hESCs (Chin et al, 2009). For this study, the two hiPS cell lines were therefore analyzed at late passage. However, long-term culture might induce genomic instability (Baker et al, 2007), which could compromise the pluripotency of these cell lines. We therefore confirmed the pluripotency of both hiPS cell lines by checking the expression of known hESC markers (e.g., OCT4, podocalyxin and TRA-1-60), karyotypic stability and in-vivo differentiation capabilities (Supplementary Figure S1). Characterization of the hESCs (HES-3 cell line) has been described elsewhere (Chin et al, 2010a).

All six samples were subjected to proteomic analysis (Figure 1). Proteins were extracted in a buffer containing 8 M urea and subsequently cleaved into peptides by double digestion with Lys-C and trypsin (Figure 1). Metabolic labeling presents some caveats in hESCs (Van Hoof et al, 2007) and, so far, has not been applied to hiPSCs, whereas label-free approaches are less suitable for strategies based on extensive multidimensional separation. Therefore, we applied our in-house peptide labeling method, which combines solid-phase extraction with triplex dimethyl labeling chemistry (Boersema et al, 2009). Two biological replicas were conducted for each experiment, in which labels were swapped between the hESCs and hiPSCs (the label of the parental fibroblasts was kept constant; Figure 1). To maximize protein identification, we reduced sample complexity by strong cation exchange (SCX) chromatography. Subsequently, Experiment 1 was analyzed by high-resolution LC-MS/MS with both electron transfer dissociation (ETD) and collision-induced dissociation (CID) for peptide sequencing. Experiment 2 was analyzed with a data-dependent decision tree using higher-energy collision dissociation (HCD) and ETD with either Orbitrap or linear ion trap readout (Frese et al, 2011). The MS intensities of the 'light', 'intermediate' and 'heavy' peaks accurately reflect the relative abundance of peptides in the three cell types (Figure 1).
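The way the three isotope channels and the label swap translate into relative abundances can be illustrated with a short Python sketch. The channel assignment below is hypothetical (the actual assignment is only shown in Figure 1), and summarizing a protein by the median of its peptide log2 ratios is an assumed, commonly used choice, not necessarily the one applied in this study.

```python
import math
from statistics import median

# Hypothetical channel assignment for the two biological replicas.
# Fibroblasts keep the same channel; hESC and hiPSC labels are swapped.
LABELS = {
    "replica1": {"light": "hESC", "intermediate": "hiPSC", "heavy": "Fibro"},
    "replica2": {"light": "hiPSC", "intermediate": "hESC", "heavy": "Fibro"},
}


def peptide_log2_ratio(intensities, labels, numerator, denominator):
    """log2(numerator/denominator) for one light/intermediate/heavy peptide triplet."""
    # Map each channel's MS intensity onto the cell type carrying that label.
    by_cell = {cell: intensities[channel] for channel, cell in labels.items()}
    return math.log2(by_cell[numerator] / by_cell[denominator])


def protein_log2_ratio(peptides, labels, numerator, denominator):
    """Protein ratio as the median of its peptide log2 ratios (an assumed summary)."""
    return median(
        peptide_log2_ratio(p, labels, numerator, denominator) for p in peptides
    )


# Toy intensities for one protein (invented numbers).
replica1_peptides = [
    {"light": 2000.0, "intermediate": 1000.0, "heavy": 500.0},
    {"light": 4000.0, "intermediate": 2000.0, "heavy": 1000.0},
]
replica2_peptides = [
    {"light": 1000.0, "intermediate": 2000.0, "heavy": 500.0},
]

hesc_vs_hipsc_r1 = protein_log2_ratio(replica1_peptides, LABELS["replica1"], "hESC", "hiPSC")
hesc_vs_hipsc_r2 = protein_log2_ratio(replica2_peptides, LABELS["replica2"], "hESC", "hiPSC")
fibro_vs_hipsc_r1 = protein_log2_ratio(replica1_peptides, LABELS["replica1"], "Fibro", "hiPSC")
```

Because the hESC and hiPSC labels are swapped, a genuine biological difference gives the same sign in both replicas once the channels are mapped back to cell types, whereas a label-specific bias would flip sign between them.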



In-depth quantitative proteomic analysis of hESCs, hiPSCs and fibroblasts

An overview of the proteomic results is presented in Supplementary Table S1. Briefly, a total of 348 LC-MS/MS analyses (including technical and biological replicates) were performed, leading to 4 551 920 MS/MS sequencing events (cumulative number of CID, HCD and ETD spectra). We confidently identified 1 593 446 peptide-spectrum matches at a peptide false discovery rate (FDR) below 1% (Mascot Ion Score > 20). In Experiment 1 (IMR90), a total of 6873 unique protein groups were identified (3994 in common between both biological experiments; Supplementary Table S2). Experiment 2 (4Skin) yielded 8548 unique protein groups (5516 identified in both biological replicas; Supplementary Table S3). Combining all the data sets, we identified 10 628 unique protein groups (3001 proteins were identified at the intersection of all four data sets). Most importantly, the vast majority of the proteins in our data set (80-90%) were identified on the basis of at least two unique peptides, with an average of 9 ± 15 peptides per protein (Supplementary Figures S2 and S3). To the best of our knowledge, the coverage obtained in this study is the largest achieved by any proteomics screen on pluripotent cells.

Figure 1 Experimental workflow and overview of the proteomic experiments performed. To characterize the proteomes of human-induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs), two MS-based experiments, using two independent hiPS cell lines, were conducted. Experiment 1 (top-left panel) focused on hiPS_IMR90, hESCs and IMR90 fetal fibroblasts (cell line used for reprogramming). Experiment 2 (top-right panel) focused on hiPS_4Skin, hESCs and parental 4Skin fetal fibroblasts. Proteins were extracted and digested with Lys-C and trypsin. Peptides were labeled using triplex dimethyl chemistry, mixed in equal amounts and prefractionated by strong cation exchange (SCX). Two biological replicas were performed for each experiment, in which labels were swapped between hiPSCs and hESCs. SCX fractions were analyzed by high-resolution LC-MS/MS. Experiment 1 was analyzed with an LTQ Orbitrap XL using both CID and ETD fragmentation, whereas Experiment 2 was analyzed with an LTQ Orbitrap Velos using a data-dependent decision tree (DDDT) with HCD and ETD. The peak intensities of the identified peptides reflect their relative abundance in the samples (bottom panel).

© 2011 EMBO and Macmillan Publishers Limited

Typically, proteomic studies are biased toward the detection of highly expressed genes; nevertheless, our data set includes numerous proteins known to be of low abundance in mammalian cells. We classified all 10 628 identified proteins by protein class (7631 had official gene symbols with functional annotation) and found 649 transcription factors, 247 kinases (48% of the putative human kinome; Manning et al, 2002) and proteins that are difficult to detect by MS, such as membrane proteins (1494 proteins were predicted by TMHMM to contain transmembrane helices). Interestingly, we also confirmed the existence, at the protein level, of genes for which only transcript evidence was available (1876 proteins were annotated as 'hypothetical' or 'putative uncharacterized'). Furthermore, we compared our data set with two of the largest proteomic analyses carried out to date in hESCs (Van Hoof et al, 2009; Rigbolt et al, 2011). Remarkably, we found a high overlap, as we identified ~90% of the proteins reported by these studies (~2200 proteins were unique to our current analysis). The transcriptional circuitry involved in pluripotency is controlled by a core of three transcription factors: SOX2 (Yuan et al, 1995; Avilion et al, 2003), NANOG (Mitsui et al, 2003) and OCT4 (Niwa et al, 2000). We confidently identified the protein products of these genes and of several other well-known hESC markers, such as DNMT3B, UTF1, PODXL, GRB7 and BRIX (Adewumi et al, 2007). Taken together, these results indicate the comprehensiveness of our data (mammalian cells express 10 000-15 000 transcripts; Jongeneel et al, 2003), which can therefore serve as a reliable resource for those interested in the pluripotent stem cell proteome.
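The text reports a peptide-level FDR below 1% at a Mascot Ion Score cutoff of 20 but does not describe, in this section, how that FDR was estimated. A widely used approach is target-decoy searching, and the Python sketch below assumes it: the FDR among peptide-spectrum matches (PSMs) above a threshold is estimated as the number of decoy hits divided by the number of target hits. The toy scores are invented and the function names are hypothetical.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR among PSMs with score >= threshold as (#decoy hits) / (#target hits).

    psms: list of (score, is_decoy) tuples, e.g. Mascot ion scores.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold(psms, max_fdr=0.01):
    """Smallest observed score whose cutoff gives an estimated FDR <= max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if decoy_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


# Invented PSMs: (score, is_decoy).
psms = [(45, False), (38, False), (30, False), (25, False),
        (22, True), (21, False), (18, True), (15, False)]
fdr_at_20 = decoy_fdr(psms, 20)      # 1 decoy / 5 targets
cutoff = lowest_threshold(psms, 0.01)
```

In practice the decoy/target ratio is computed over hundreds of thousands of PSMs, so a fixed cutoff such as a score of 20 can be checked against the 1% target directly.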
Overall, 5835 proteins were quantified in Experiment 1, 3537 of which were common to the two biological replicas (Supplementary Table S2). Likewise, quantitative measurements for 7154 proteins were obtained in Experiment 2, of which 4718 proteins were measured in both biological replicas (Supplementary Table S3). We further focused on the 2683 proteins confidently quantified in all our experiments and data sets. The analysis of variability in our technical (Supplementary Figure S4) and biological (Supplementary Figure S5) replicas demonstrated high quantification accuracy and reproducible proteomic measurements for both experiments, with Pearson correlation coefficients between 0.84 and 0.96. Remarkably, ~85% of our protein ratios showed <35% variability (Supplementary Figures S6D, S7D, S8D and S9D). Of note, we obtained accurate measurements for proteins changing in abundance by more than 100-fold. Furthermore, using the extracted ion chromatograms of the three most abundant peptides per protein (Grossmann et al, 2010), we estimated the absolute abundance of the identified proteins within the samples, spanning six orders of magnitude (Supplementary Tables S2 and S3).
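Two of the quantities above can be sketched in a few lines of Python: the per-protein variability, taken here as the relative standard deviation (RSD) of the peptide ratios behind each protein ratio, and a Top3-style abundance estimate that averages the extracted ion chromatogram (XIC) areas of the three most intense peptides. This is a minimal illustration under those assumptions; it does not reproduce any normalization that may have been applied in the study (Grossmann et al, 2010), and the toy values are invented.

```python
from statistics import mean, stdev


def relative_std_dev(peptide_ratios):
    """Relative standard deviation (%) of the peptide ratios behind one protein ratio."""
    return 100.0 * stdev(peptide_ratios) / mean(peptide_ratios)


def fraction_below(rsd_values, cutoff=35.0):
    """Fraction of proteins whose RSD is below the cutoff (35% in the text)."""
    return sum(1 for v in rsd_values if v < cutoff) / len(rsd_values)


def top3_abundance(peptide_xic_areas):
    """Top3-style estimate: mean XIC area of the three most intense peptides.

    Proteins with fewer than three peptides use whatever is available (an assumption).
    """
    return mean(sorted(peptide_xic_areas, reverse=True)[:3])


rsd = relative_std_dev([1.0, 1.2, 0.8])            # about 20%
share = fraction_below([12.0, 20.0, 48.0, 30.0])   # 3 of 4 proteins below 35%
abundance = top3_abundance([6.0e6, 3.0e6, 1.0e5, 3.0e6, 9.0e6])  # mean of 9e6, 6e6, 3e6
```

Because Top3 values are compared across proteins rather than across samples, they are typically read on a log10 scale, which is how a range spanning six orders of magnitude is displayed.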



High similarity in the proteomes of hESCs and hiPSCs

Although both hiPSCs and hESCs are pluripotent, it is still unclear how similar the two cell types are at the proteome level. We therefore compared the protein levels of hESCs and hiPSCs and found a very high degree of similarity. In Figure 2, the absolute protein abundance (log10 scale) is plotted against the relative protein ratios (log2 scale) for the hESC/IMR90_iPS (Figure 2A) and hESC/4Skin_iPS (Figure 2B) comparisons. The vast majority of the proteins showed minor or no changes between hiPSCs and hESCs (as seen in the frequency histograms). As expected, pluripotency markers including SOX2, NANOG, OCT4, LIN28 and SALL4 were found at almost identical levels in hESCs and hiPSCs in both experiments. We then sought to define the proteins that were differentially expressed between hESCs and hiPSCs. For this purpose, we used the significance analysis of microarrays (SAM) test (Tusher et al, 2001), a statistical test commonly used in transcriptomic studies (see Materials and methods section). SAM has recently been shown to be applicable to quantitative proteomic data sets as well (Roxas and Li, 2008) and is particularly useful because it provides an estimate of the FDR for a defined set of significant changes. Only the proteins quantified in both experiments (i.e., IMR90 and 4Skin) and in both biological replicas (i.e., 2683 proteins) were subjected to statistical analysis using SAM. Figure 3A shows the log2 ratios for the hESC/hiPSC comparisons of the 2683 'confidently' quantified proteins as a heatmap. After SAM analysis, we found 58 proteins significantly regulated (FDR = 1.27%, Supplementary Figure S10A) between the two hiPS cell lines and the hESCs: 46 proteins with higher levels in hESCs and 12 proteins with higher levels in hiPSCs (Figure 3B and Supplementary Table S4).

Next, we tested whether the proteins differentially expressed between hiPSCs and hESCs were functionally linked. To this end, we performed GO enrichment analyses using all the 10 658 identified proteins in this study as the background data set (see Materials and methods section). The 46 proteins upregulated in hESCs were enriched (P<0.05, binomial test) in GO terms related to antigen processing (e.g., β-2 microglobulin (B2M), TAPBP) and to the metabolism of amino acids (e.g., SDHB, ACOX1) and lipids (e.g., APOL2, SOAT1), among others (Supplementary Figure S11A). On the other hand, the 12 proteins that we found highly expressed in hiPSCs were mainly related to cell adhesion and to ectoderm and mesoderm development (e.g., VCAN, COL4A1, CDH2; P<0.05, binomial test; Supplementary Figure S11B). Taken together, our results indicate that the reprogramming process remodeled the proteome of both fibroblast cell lines to a profile that closely resembles the pluripotent hESC proteome: 97.8% of the 'confidently' quantified proteins (i.e., 2683 proteins) showed nonsignificant changes. Nevertheless, a small fraction of their proteomes, 58 proteins (2.2%), was found to change significantly between hiPSCs and hESCs. Functional analyses of this subset of proteins revealed enrichment in certain biological processes, including cell communication and the immune system.
Profound differences in the proteomes of hiPSCs and their parental fibroblast cell lines

The inclusion of the two parental fibroblast cell lines in our analysis allowed us to study changes in the proteome at both the starting and end points of the reprogramming process. As expected, this comparison revealed completely different proteomes: the vast majority of the proteins showed differential expression between the parental fibroblasts and the reprogrammed pluripotent cells (Figure 2). We observed a remarkable number of highly abundant proteins displaying very extreme ratios in the fibroblasts (more than 100-fold in many cases). Some of these proteins corresponded to fibroblast markers, such as VIM, COL1A1, COL1A2 and THBS1, which were absent in the hiPSCs. On the other hand, we also confirmed the higher expression of known pluripotency markers, such as SOX2, NANOG, OCT4, LIN28 and SALL4, in the hiPSCs. Using SAM, we found 943 proteins significantly enriched in fibroblasts and 1029 proteins with higher levels in the reprogrammed cells (FDR = 1.1%, Supplementary Figure S10B; Figure 3C and D and Supplementary Table S4). When we looked at the GO terms associated with these proteins, we found that the fibroblast-enriched proteins were enriched in terms related to transport, endocytosis, exocytosis and metabolism (Supplementary Figure S11C). In contrast, the proteins enriched in hiPSCs mapped to numerous GO categories spanning different biological processes, such as nucleic acid metabolism, chromatin organization and cell cycle (Supplementary Figure S11D). To investigate this further, we subjected all the hiPSC- and fibroblast-specific proteins to STRING analysis (Snel et al, 2000), a bioinformatic tool that reconstructs protein networks based on different features such as gene co-expression, physical interactions and co-citation. Strikingly, we obtained hyper-connected protein networks for both sets of proteins (Supplementary Figure S12). The majority of the proteins showed multiple functional connections with other members, and a densely interconnected protein cluster with thousands of links was clearly observed in both analyses. We therefore reasoned that the observed protein networks may constitute the protein backbone that controls pluripotent cells and fully differentiated fibroblast cells.

Figure 2 Quantitative proteomic comparisons of hESCs, two hiPSCs and their precursor fibroblast cell lines. Protein abundances (Grossmann et al, 2010) are plotted against protein ratios for the hESCs/IMR90_iPS (A), hESCs/4Skin_iPS (B), IMR90_Fibro/IMR90_iPS (C) and 4Skin_Fibro/4Skin_iPS (D) comparisons. The size of each spot reflects the number of unique peptides used to calculate the protein ratio. The color code reflects the variability (i.e., relative standard deviation) of the peptide ratios for each protein. On top, the frequency histograms show the density of proteins in each analysis using a bin size of 0.25 (log2). Some of the proteins discussed in the text are shown in the plots. NA, not applicable.

Figure 3 Proteome differences between hESCs, hiPSCs and their precursor fibroblasts. 2683 proteins were quantified in the four data sets: IMR90 and 4Skin experiments (biological replicas 1 and 2). Relative protein abundances are represented as heatmaps for the hESC/hiPSC (A) and fibroblast/hiPSC (C) comparisons. Using significance analysis of microarrays (SAM; Tusher et al, 2001; Roxas and Li, 2008), 58 proteins (2.2%) were found significantly regulated between hESCs and hiPSCs (B) and 1927 (73.4%) between fibroblasts and hiPSCs (D). The figure was created using the Multi Experiment Viewer software (Saeed et al, 2006). Red and green colors indicate upregulated and downregulated events, respectively. Genes were further grouped by hierarchical clustering (distance metric: cosine correlation; linkage method: average).
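SAM underlies both the hESC/hiPSC and the fibroblast/hiPSC comparisons. As a rough illustration only, the Python sketch below implements a simplified one-class, SAM-like statistic on per-protein log2 ratios, assuming that each of the four data sets (IMR90 and 4Skin, two biological replicas each) contributes one log2 ratio per protein. It departs from SAM proper (Tusher et al, 2001) in labeled ways: s0 is set to the median standard error rather than chosen by SAM's coefficient-of-variation criterion, and a single symmetric |d| cutoff replaces SAM's comparison against expected order statistics. It is not the implementation used in the study, and the toy data are invented.

```python
import itertools
import math
from statistics import mean, median, stdev


def standard_error(values):
    return stdev(values) / math.sqrt(len(values))


def sam_scores(rows, s0):
    """One-class SAM-like statistic d = mean / (standard error + s0) per protein."""
    return [mean(v) / (standard_error(v) + s0) for v in rows]


def sam_one_class(rows, threshold, s0=None):
    """Call proteins with |d| >= threshold; estimate FDR with sign-flip permutations.

    rows: per-protein log2 ratios, one value per replicate column
          (e.g. IMR90 rep1, IMR90 rep2, 4Skin rep1, 4Skin rep2).
    """
    if s0 is None:
        # Simplification: SAM chooses s0 by minimising the coefficient of
        # variation of d; here we just use the median standard error.
        s0 = median(standard_error(v) for v in rows)
    observed = sam_scores(rows, s0)
    called = [i for i, d in enumerate(observed) if abs(d) >= threshold]

    n_cols = len(rows[0])
    null_counts = []
    # Flip the sign of whole replicate columns (same flip for every protein).
    for signs in itertools.product((1, -1), repeat=n_cols):
        if all(s == 1 for s in signs):
            continue  # the unflipped pattern is the observed data
        flipped = [[s * x for s, x in zip(signs, v)] for v in rows]
        null_counts.append(sum(abs(d) >= threshold for d in sam_scores(flipped, s0)))

    expected_false = mean(null_counts)
    fdr = expected_false / len(called) if called else 0.0
    return called, fdr


# Invented log2(hESC/hiPSC) ratios for six proteins across four data sets.
rows = [
    [2.1, 1.9, 2.0, 2.2],
    [-1.8, -2.0, -2.1, -1.9],
    [0.1, -0.2, 0.05, -0.1],
    [0.3, -0.1, -0.2, 0.1],
    [-0.05, 0.1, 0.0, -0.1],
    [0.2, 0.1, -0.3, 0.0],
]
called, fdr = sam_one_class(rows, threshold=3.0)
```

With only four replicate columns there are just 15 non-identity sign patterns, so the permutation-based FDR estimate is coarse; on a real data set with thousands of proteins the estimate is driven by how many proteins exceed the cutoff under each pattern.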
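The GO enrichment results in the two comparisons are reported with a binomial test against the set of all identified proteins. A minimal Python sketch of a one-sided binomial enrichment P-value, assuming that the expected annotation rate for a term is simply its frequency in the background set, could look as follows; the counts in the example are hypothetical, and no multiple-testing correction is shown.

```python
from math import comb


def binomial_enrichment_p(k, n, annotated_in_background, background_size):
    """One-sided binomial P(X >= k), X ~ Binomial(n, p).

    k: proteins in the test set annotated with the GO term
    n: size of the test set (e.g. the 46 proteins higher in hESCs)
    p: fraction of the background set annotated with the term
    """
    p = annotated_in_background / background_size
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: 4 of 46 test proteins carry a term that annotates
# 50 of 10 000 background proteins.
p_value = binomial_enrichment_p(4, 46, 50, 10_000)
```

Here the expected count is only 46 × 50 / 10 000 ≈ 0.23 annotated proteins, so observing four yields a very small P-value.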
mRNA and protein correlation

Figure 4 Transcript levels of the differentially expressed proteins between hiPSCs and hESCs. The mRNA levels of the identified proteins were measured by microarrays (Affymetrix platform) on the same samples that were used for the proteomic analyses. Gene symbols were used to match the measurements from both approaches. The figure shows the transcript levels for all the significantly regulated proteins found for hESCs > hiPSCs (A), hESCs < hiPSCs (B), fibroblasts > hiPSCs (C) and fibroblasts < hiPSCs (D). Protein and mRNA ratios were calculated as the average of the four measurements obtained (IMR90 and 4Skin experiments with two biological replicas each). The corresponding error bars (standard deviations) are shown for each value (error bars in C and D were omitted for better visualization).

Protein levels are adjusted by intricate mechanisms of gene expression regulation. For instance, a poor correlation between protein and mRNA levels in differentiating mouse ESCs was recently reported (Lu et al, 2009). To find out whether the differences observed in our study were a consequence of transcriptional or translational regulation, we performed paired genome-wide gene expression analyses on the same six samples that were used for the proteomic profiling (Supplementary Methods). Overall, we observed a good correlation between mRNA and protein levels (r ≈ 0.7). Most importantly, when we looked at the transcript levels of the regulated proteins, we found remarkable agreement (Figure 4). In the hESC/hiPSC comparison, most of the differential proteins were accompanied by a change in mRNA levels in the same direction (Figure 4A and B). The fibroblast/hiPSC comparison showed the same trend, with most of the genes regulated between these two cell types affected at both the protein and the mRNA level (Figure 4C and D). These results further support the proteomic measurements and imply a high degree of control at the transcriptional level. Nevertheless, numerous genes showed uncorrelated mRNA and protein levels, highlighting the need to complement transcriptomic approaches with proteomics.
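The paired mRNA/protein comparison (matching by gene symbol, an overall Pearson correlation and agreement in the direction of change) can be sketched in Python as below. The gene symbols GENE_A to GENE_F and their values are hypothetical placeholders, and the functions are illustrative rather than the analysis pipeline used here.

```python
import math


def paired_by_symbol(protein, mrna):
    """Align protein and mRNA log2 ratios on shared gene symbols."""
    shared = sorted(set(protein) & set(mrna))
    return [protein[g] for g in shared], [mrna[g] for g in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def same_direction_fraction(xs, ys):
    """Fraction of genes whose protein and mRNA ratios share a sign."""
    return sum(1 for x, y in zip(xs, ys) if x * y > 0) / len(xs)


# Hypothetical log2(hESC/hiPSC) values keyed by placeholder gene symbols.
protein = {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_C": 0.5, "GENE_D": -0.2, "GENE_E": 0.3}
mrna = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": -0.3, "GENE_D": -0.1, "GENE_F": 0.1}

prot_vals, mrna_vals = paired_by_symbol(protein, mrna)  # GENE_A..GENE_D
r = pearson_r(prot_vals, mrna_vals)
agreement = same_direction_fraction(prot_vals, mrna_vals)  # 3 of 4 agree
```

Genes present in only one data set (GENE_E, GENE_F) are dropped by the symbol join, mirroring how only genes measured on both platforms can be compared.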



Discussion 

Since the discovery in 2006 that somatic cells can be reprogrammed to an embryonic-like state (Takahashi and Yamanaka, 2006), a fundamental question has remained unanswered: are hiPSCs equivalent to hESCs, their natural counterparts? This is especially relevant because genetic defects may affect hiPSCs during differentiation and/or transplantation (Hanna et al, 2010). The conventional procedure to evaluate the pluripotency of iPSCs is based on biological assays of developmental potency. However, the tetraploid complementation assay, which is considered the gold-standard test for pluripotency, is restricted to murine cell lines. Accordingly, hiPSCs need to be examined extensively at the molecular level. This allows hiPSCs to be characterized on the basis of quantitative measurements, which at the same time will increase our knowledge of the mechanisms underlying pluripotency and self-renewal. In the last few years, several studies have reported analyses of DNA methylation status, histone modification patterns, and coding mRNA and non-coding miRNA expression patterns in both hiPSCs and hESCs. The conclusions derived from such studies are still uncertain, and the presence of a recurrent molecular signature from the parental cell line as a consequence of incomplete reprogramming (i.e., epigenetic memory) is currently being debated. However, all the aforementioned levels of gene expression regulation function in an orchestrated manner to tune the actual molecular effectors of cells: proteins. Here, we have compared the proteomes of two different hiPS cell lines, their corresponding somatic cells and one hES cell line.

Faced with the challenge of the enormous dynamic range of protein abundance in mammalian cells, we extensively fractionated the samples using an SCX-based approach. This allowed us to separate peptides based on their charge state; the peptides were subsequently sequenced using targeted fragmentation schemes (i.e., CID, ETD, HCD) to enhance peptide identification (Frese et al, 2011). Using this approach, we achieved one of the largest proteome coverages reported for pluripotent and somatic cells, spanning six orders of magnitude in protein abundance (Figure 2, Supplementary Tables S2 and S3). The 50 most abundant proteins comprised cytoskeletal proteins (e.g., ACT1, ACTBL2, TUBB2A), chaperones (e.g., HSP90AB1, HSPA8, HSPA2), ribosomal proteins (e.g., RPS27A, HNRPC) and histones (e.g., HIST1H4A, HIST1H2AB), the latter group likely reflecting the high nucleus/cytoplasm ratio of pluripotent stem cells (Thomson et al, 1998). On the other hand, among the less abundant proteins, we found numerous transcription factors (e.g., DMTF1, BTBD1, ZNF316), signaling molecules and regulatory proteins (e.g., EFNB2, SOCS7, STK35). Most importantly, proteins known to be associated with pluripotency and self-renewal, such as SOX2, NANOG, OCT4, LIN28 and SALL4, were found to have expression levels in the middle range (i.e., ~1000 times less abundant than the structural components), which confirms the importance of their functions in these cells. Given that mRNA levels poorly predict protein translation rates (Schwanhäusser et al, 2011), our data set may be highly valuable for applications such as FACS and RNAi screens, in which knowing the absolute levels of proteins could determine the success of the experiment (van der Flier et al, 2009).

The use of cost-effective dimethyl isotopes in our workflow allowed us to accurately quantify relative protein changes between hESCs, hiPSCs and their parental fibroblast cell lines. Furthermore, the overall good correlation with the mRNA levels (obtained from paired microarray analyses) supported the reliability of our proteomic measurements. The comparison of two different hiPS cell lines with the hESCs confirmed, at the protein level, that the reprogramming process successfully activated the expression of pluripotency genes and repressed those related to terminally differentiated fibroblasts. The proteomes of hiPSCs and hESCs were found to be very similar, with 97.8% of the proteins displaying nonsignificant changes. Nevertheless, a small subset of proteins (58) was found to be differentially expressed in both experiments (Figure 3B and Supplementary Table S4), among them several components of the immune system. Our results showed that iPSCs have reduced levels (less than 3-fold) of two proteins that are essential for the cell-surface expression of HLA class I molecules and for correct antigen presentation: B2M and tapasin (TAPBP). In agreement with our results, it has been shown that the reprogramming process might downregulate MHC and antigen-processing molecules through epigenetic mechanisms (Suarez-Alvarez et al, 2010). Further experimentation will be necessary to determine whether these findings impact the immunogenicity of hiPSCs, but, interestingly, a recent report has described immune rejection of transplanted autologous murine iPSCs (Zhao et al, 2011).

Several lines of evidence indicate that epigenetic mechanisms underlie some of the differences found in transcriptomic studies between hiPSCs and hESCs (reviewed in Hanna et al, 2010). Genome-wide maps of histone modifications, i.e., the activating H3K4me3 and repressive H3K27me3 marks, revealed that both types of pluripotent cell lines are markedly similar (Guenther et al, 2010). However, histone tail modifications are reversible changes that cause local formation of heterochromatin, whereas DNA methylation leads to long-term repression (Berger, 2007). Analyses of the 'DNA methylome' at different base-pair resolutions have shown clear differences between hiPSCs and hESCs, indicating that the reprogramming process could fail to repress certain genes from the donor cells (epigenetic memory). Moreover, aberrant methylation patterns acquired during reprogramming (epigenetic mutation) have been described (Lister et al, 2011). Hence, we checked the levels of the 12 proteins enriched in hiPSCs in the parental cell lines. Only one protein, transferrin receptor 1 (TFRC), showed increased levels in both IMR90 and 4Skin fibroblasts, which may therefore be explained by incomplete repression of somatic genes during reprogramming (CDH2 and COL4A1 were also found highly expressed in the IMR90 fibroblasts, but not in the 4Skin cell line). The remaining proteins showed lower expression in the fibroblasts than in the reprogrammed hiPSCs.

On the basis of transcriptomic profiling, several groups have reported that hiPSCs can be distinguished from hESCs by the presence of a recurrent molecular signature (Chin et al, 2009; Marchetto et al, 2009; Ghosh et al, 2010). A more recent study that included a significantly higher number of cell lines indicated that the gene expression programs of hESCs and hiPSCs partially overlapped, although hES and hiPS cell lines differed significantly on average (Bock et al, 2011). Owing to the relatively low throughput of MS-based proteomics compared with other '-omics' approaches, our study was limited to two different hiPS cell lines, i.e., IMR90_iPS and 4Skin_iPS. Nevertheless, we found 12 proteins consistently enriched in hiPSCs and 46 proteins in hESCs (Figure 3 and Supplementary Table S4). This may be explained by the similar nature of our hiPS cell lines: both were retrovirally reprogrammed (Yu et al, 2007), are of mesodermal origin and were cultured to late passage, which altogether could contribute to reducing noise in the analysis (Bock et al, 2011). To investigate these findings further, we compared our list of differential proteins with the transcript lists derived from several independent analyses (Yu et al, 2007; Maherali et al, 2008; Chin et al, 2009; Guenther et al, 2010). These studies include a broad spectrum of different hiPSCs: alternative reprogramming methods, different somatic origins and low-passage cultures. The comparisons showed some genes in common with the differential transcripts published by Yu et al (2007): 12/58 (e.g., CDH2, TFRC, RRM2, ACAT1, CAPG, SDR39U1) and by Maherali et al (2008): 10/58 (e.g., DCXR, RCN3, CDH2, SLC38A2, ACOX1, HSPA2). Nonetheless, the overlap was not significant (Fisher test, P > 0.05), and we did not find any regulated gene common to all the studies.
Consequently, our results are in line with the emerging idea that
differences between hiPSCs and hESCs reflect experimental conditions
rather than a consistent molecular signature. As expected, the
comparison of both hiPS cell lines with their somatic donor
fibroblasts showed massive differences in their proteomes.
Bioinformatic analyses of the proteins differentially expressed
between these cell lines revealed functionally interconnected protein
networks in the hiPSCs and fibroblasts (Figure 3). The hiPSC network
is especially relevant, as it may constitute the protein core
regulating pluripotency. Interestingly, within this network we found
many proteins known to interact with SOX2, NANOG and OCT4 in hESCs
(Wang et al, 2006; Mallanna et al, 2010; Pardo et al, 2010; van den
Berg et al, 2010), such as Requiem, PRC1, Wdr3b, P66b and NAC1.
Furthermore, numerous proteins in this network are encoded by target
genes of SOX2, NANOG and OCT4, the molecular circuitry governing
pluripotency (Boyer et al, 2005), including SMARCAD1, RIF1, ARID1B,
DPPA4 and TLE3.
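The overlap comparison with published transcript lists can be illustrated with a one-sided hypergeometric test, which is what a one-sided Fisher test on a 2 × 2 table reduces to. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the test needs a 'universe' of genes measurable on both platforms, whose size the text does not give, so no values from the study are plugged in.

```python
from math import comb


def overlap_p_value(universe: int, list_a: int, list_b: int, overlap: int) -> float:
    """One-sided hypergeometric (Fisher) P-value for the overlap of two gene lists.

    universe: number of genes that could appear in both lists (N)
    list_a:   size of the first list, e.g. differential proteins (K)
    list_b:   size of the second list, e.g. differential transcripts (n)
    overlap:  number of genes found in both lists (k)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not 0 <= overlap <= min(list_a, list_b):
        raise ValueError("overlap must lie between 0 and min(list_a, list_b)")
    if max(list_a, list_b) > universe:
        raise ValueError("lists cannot be larger than the universe")
    total = comb(universe, list_b)
    # Count the ways to draw list_b genes with exactly x of them in list_a, for x >= k.
    tail = sum(
        comb(list_a, x) * comb(universe - list_a, list_b - x)
        for x in range(overlap, min(list_a, list_b) + 1)
    )
    return tail / total


# Textbook check (lady tasting tea): 8 cups, 4 milk-first, 4 picked, all 4 correct.
print(overlap_p_value(8, 4, 4, 4))  # equals 1/70
```

If SciPy is available, `scipy.stats.fisher_exact([[k, K - k], [n - k, N - K - n + k]], alternative="greater")` should return the same one-sided P-value for the same table.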

This study constitutes an invaluable resource for the stem cell
community, adding an essential layer, the protein content, to the
systems biology view of pluripotency and highlighting the molecular
similarities between these pluripotent cell lines. We hope that our
data will serve as a platform for future, more targeted
investigations.

Materials and methods 
Culture of hiPSCs 

Induced pluripotent stem cell lines IMR90_iPS and 4Skin_iPS were
cultured on Matrigel (Becton Dickinson)-coated dishes in mTeSR1
medium (StemCell Technologies) and passaged every 7 days. Briefly,
the cells were washed once with phosphate-buffered saline (PBS;
Gibco) before enzymatic treatment with dispase (Chemicon) for ~3 min.
After neutralization with medium, the cells were triturated into
small clumps or single cells and seeded onto new Matrigel-coated
dishes at a split ratio of 1:3 to 1:8. The cells were incubated at
37°C in a 5% CO2 incubator.



Characterization of pluripotency 

Flow cytometry analysis (FACS) 

The expression levels of the pluripotency markers Oct-4, podocalyxin
and Tra-1-60 in iPSC populations were assessed by immunofluorescence
using flow cytometry. Cells were harvested as single-cell suspensions
using 0.25% trypsin-EDTA (Gibco), fixed and permeabilized (Caltag
Laboratories) before incubation with mouse monoclonal antibodies to
Oct-4 (1:20, Santa Cruz), podocalyxin (mAb 84, 5 µg, in-house) and
Tra-1-60 (1:50, Chemicon). Cells were then washed with 1% BSA/PBS and
incubated in the dark with a FITC-conjugated goat anti-mouse antibody
(DAKO) at 1:500 dilution. After incubation, the cells were washed and
resuspended in 1% BSA/PBS for analysis on a FACScan (Becton Dickinson
FACS Calibur). All incubations were performed at room temperature for
15 min. For the negative control, cells were stained with the
appropriate isotype control.



Staining of iPSCs for markers

Staining of iPSCs was carried out by incubating the cells with
fixative Reagent A (Caltag Laboratories) for 1 h before blocking with
3% BSA/PBS for another hour. After washing with 0.1% Triton/PBS, the
cells were incubated with antibodies to Oct-4 (Santa Cruz), SSEA-4
(DSHB) and Tra-1-60 (Chemicon) for 1 h. Bound antibodies to the
pluripotency markers were visualized using a PE-conjugated goat
anti-mouse antibody (DAKO; diluted 1:500).



Karyotypic stability 

Karyotype analysis was performed by the Cytogenetics Laboratories
at the Department of Obstetrics and Gynaecology, KK Women's and
Children's Hospital. Cell samples were incubated with BrdU/colcemid
(reagent from the hospital) for 16 h in a 37°C, 5% CO2 incubator.

In vivo differentiation assay, SCID mouse model and
teratoma analysis

Induced PSCs were harvested by collagenase (Sigma) treatment, and
approximately 4-5 × 10^6 cells were injected with a sterile 22G
needle into the rear leg muscle of 4-week-old female SCID mice. Mice
that developed tumors approximately 9-10 weeks after injection were
killed, and the tumors were dissected and fixed in 10% formalin.
Tumors were embedded in paraffin, sectioned and examined
histologically after hematoxylin and eosin staining.



Sample preparation for MS 

Cells (i.e., IMR90_iPS, 4Skin_iPS, hESCs, and IMR90 and 4Skin
fibroblasts) were harvested by centrifugation at 2500 g for 10 min at
4°C. Cell lysis was performed in a buffer containing 8 M urea and 2 M
thiourea in 25 mM ammonium bicarbonate, pH 8.2, with protease and
phosphatase inhibitors (Roche). Proteins (~1 mg) were first
reduced/alkylated and digested for 4 h with Lys-C. The mixture was
then diluted 4-fold to 2 M urea and digested overnight with trypsin.
Digestion was quenched by acidification with formic acid (final
concentration 2%). The resulting peptides were then chemically
labeled by stable isotope dimethyl labeling as described previously
(Boersema et al, 2009). Briefly, IMR90_iPS and 4Skin_iPS peptides
were labeled with formaldehyde-H2 and sodium cyanoborohydride
('light' reagent), hESC peptides with formaldehyde-D2 and
cyanoborohydride ('intermediate' reagent), and IMR90 and 4Skin
fibroblast peptides with 13C-D2-formaldehyde and cyanoborodeuteride
('heavy' reagent). In a second biological replicate experiment, the
hESC and hiPSC labels were swapped, whereas the 'heavy' label for the
IMR90/4Skin fibroblasts was kept constant. The 'light',
'intermediate' and 'heavy' dimethyl-labeled samples were mixed in a
1:1:1 ratio based on total peptide amount, which was determined by
analyzing an aliquot of the labeled samples in a standard LC-MS/MS
run and comparing overall peptide signal intensities.
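The three dimethyl channels differ only in the isotopes of the two methyl groups added to each primary amine (the peptide N-terminus and lysine side chains). The sketch below derives the per-site mass shifts from monoisotopic atomic masses and computes the m/z of a labeled peptide; the function names are hypothetical, and it assumes complete labeling of every primary amine.

```python
# Monoisotopic masses (Da).
H = 1.00782503207
D = 2.0141017778
C12 = 12.0
C13 = 13.0033548378
PROTON = 1.00727646688

# Each methyl group formed on an amine: carbon plus two H/D from formaldehyde,
# plus one H/D transferred from the cyanoborohydride/cyanoborodeuteride reductant.
METHYL = {
    "light": C12 + 2 * H + H,         # formaldehyde-H2 + NaBH3CN      -> CH3
    "intermediate": C12 + 2 * D + H,  # formaldehyde-D2 + NaBH3CN      -> CHD2
    "heavy": C13 + 2 * D + D,         # 13C-D2-formaldehyde + NaBD3CN  -> 13CD3
}


def label_shift(label: str) -> float:
    """Mass added per primary amine: two N-H hydrogens replaced by two methyls."""
    return 2 * METHYL[label] - 2 * H


def labeled_mz(peptide_mass: float, n_sites: int, charge: int, label: str) -> float:
    """m/z of a fully labeled [M + zH]z+ ion; n_sites = N-terminus + lysines."""
    mass = peptide_mass + n_sites * label_shift(label)
    return (mass + charge * PROTON) / charge


for name in METHYL:
    print(name, round(label_shift(name), 4))
# light 28.0313
# intermediate 32.0564
# heavy 36.0757
```

These reproduce the commonly used dimethyl shifts. Adjacent channels are separated by about 4 Da per labeled amine (4.0251 Da light to intermediate, 4.0193 Da intermediate to heavy), which is what lets the three samples be resolved and paired within one MS spectrum.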

Before mass spectrometric analysis, both replicates were fractionated
by SCX chromatography. For Experiment 1, peptides were fractionated as
described elsewhere (Helbig et al, 2010). The SCX system consisted of
an Agilent 1100 HPLC system (Agilent Technologies, Waldbronn,
Germany) with two C18 Opti-Lynx (Optimized Technologies, OR) trapping
cartridges and a polysulfoethyl A SCX column (PolyLC, Columbia, MD;
200 mm × 2.1 mm inner diameter, 5 µm, 200 Å). The labeled peptides
were dissolved in 10% FA, loaded onto the trap columns at 100 µl/min
and subsequently eluted onto the SCX column with 80% acetonitrile
(ACN; Biosolve, The Netherlands) and 0.05% FA. SCX buffer A consisted
of 5 mM KH2PO4 (Merck, Germany), 30% ACN and 0.05% FA, pH 2.7; SCX
buffer B consisted of 350 mM KCl (Merck, Germany), 5 mM KH2PO4, 30%
ACN and 0.05% FA, pH 2.7. The gradient was as follows: 0% B for
10 min, 0-85% B in 35 min, 85-100% B in 6 min and 100% B for 4 min. A
total of 45 fractions were collected for each set and dried in a
vacuum centrifuge. The second SCX system (Pinkse et al, 2008) used a
Zorbax BioSCX-Series II column (0.8 mm inner diameter × 50 mm length,
3.5 µm). SCX solvent A consisted of 0.05% formic acid in 20% ACN, and
solvent B of 0.05% formic acid and 0.5 M NaCl in 20% ACN. The SCX salt
gradient was as follows: 0-0.01 min (0-2% B); 0.01-8.01 min (2-3% B);
8.01-14.01 min (3-8% B); 14.01-28 min (8-20% B); 28-38 min (20-40% B);
38-48 min (40-90% B); 48-54 min (90% B); 54-60 min (0% B). A total of
50 SCX fractions (1 min each, i.e., 50 µl elution volume) were
collected and dried in a vacuum centrifuge. Only the second SCX system
was used to fractionate lysates from Experiment 2.
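Both SCX salt gradients can be written as piecewise-linear programs of %B versus time. The sketch below encodes them as breakpoint lists; the names and representation are illustrative rather than an instrument method format, and it assumes linear mixing between breakpoints and, for the second system, a step return to 0% B at 54 min, as the listed segments suggest.

```python
# Gradient programs as (time in min, %B) breakpoints; %B changes linearly
# between consecutive points, and a repeated time encodes an instantaneous step.
SCX_SYSTEM_1 = [(0, 0), (10, 0), (45, 85), (51, 100), (55, 100)]
SCX_SYSTEM_2 = [
    (0, 0), (0.01, 2), (8.01, 3), (14.01, 8), (28, 20), (38, 40),
    (48, 90), (54, 90), (54, 0), (60, 0),
]


def percent_b(program, t):
    """Return %B at time t (min) by linear interpolation between breakpoints."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t1 == t0:
            continue  # zero-length segment: a step change, nothing to interpolate
        if t0 <= t < t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]  # t is exactly the final time point


for t in (5, 27.5, 53):
    print("system 1", t, percent_b(SCX_SYSTEM_1, t))
for t in (20, 33, 55):
    print("system 2", t, percent_b(SCX_SYSTEM_2, t))
```

The first program runs for 55 min in total; with 1-min fractions of 50 µl, the second corresponds to a flow rate of about 50 µl/min.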



Mass spectrometric analysis 

For Experiment 1, we performed nanoflow LC-MS/MS on an LTQ-Orbitrap
XL ETD mass spectrometer (Thermo Electron, Bremen, Germany) coupled to
an Agilent 1200 HPLC system (Agilent Technologies). SCX fractions were
dried, reconstituted in 10% FA and delivered to a trap column (Aqua™
C18, 5 µm (Phenomenex, Torrance, CA); 20 mm × 100 µm inner diameter,
packed in-house) at 5 µl/min in 100% solvent A (0.1 M acetic acid in
water). Next, peptides were eluted from the trap column onto an
analytical column (ReproSil-Pur C18-AQ, 3 µm (Dr Maisch GmbH,
Ammerbuch, Germany); 40 cm × 50 µm inner diameter, packed in-house) at
approximately 100 nl/min in a 90 min or 3 h gradient from 0 to 40%
solvent B (0.1 M acetic acid in 8:2 (v/v) ACN/water). The eluent was
sprayed via distally coated emitter tips butt-connected to the
analytical column. The mass spectrometer was operated in
data-dependent mode, automatically switching between MS and MS/MS.
Full-scan MS spectra (m/z 300 to 1500) were acquired in the Orbitrap
at a resolution of 60 000 at m/z 400 after accumulation to a target
value of 500 000 in the linear ion trap.
For SCX fractions dominated by singly and doubly charged peptides,
the five most intense ions above a threshold of 5000 were selected
for collision-induced dissociation in the linear ion trap at a
normalized collision energy of 35% after accumulation to a target
value of 10 000. For highly charged SCX fractions, the five most
intense ions above a threshold of 500 were fragmented in the linear
ion trap using electron-transfer dissociation with supplemental
activation (ETcaD) at a target value of 50 000. The ETcaD reagent
target value was set to 100 000 and the reaction time to 50 ms.
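The intensity-ranked ('top 5') precursor selection used in Experiment 1 can be sketched as follows, with collision-induced fragmentation abbreviated as CID. This is a simplified model, not instrument firmware: the `Precursor` class, the settings dictionary and the example intensities are hypothetical, and features that instrument methods commonly add, such as dynamic exclusion or charge-state screening, are not described in the text and are omitted.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    charge: int
    intensity: float


# Settings from the text: top 5 ions; threshold 5000 and target value 10 000 for CID
# (fractions dominated by 1+/2+ peptides); threshold 500 and target value 50 000 for
# ETcaD (highly charged fractions).
FRACTION_SETTINGS = {
    "low_charge": {"method": "CID", "threshold": 5_000, "target_value": 10_000},
    "high_charge": {"method": "ETcaD", "threshold": 500, "target_value": 50_000},
}


def select_precursors(survey, fraction_type, top_n=5):
    """Pick the top_n most intense survey ions above the fraction's threshold."""
    cfg = FRACTION_SETTINGS[fraction_type]
    eligible = [p for p in survey if p.intensity > cfg["threshold"]]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return [(p, cfg["method"], cfg["target_value"]) for p in eligible[:top_n]]


# Hypothetical survey scan.
survey = [Precursor(400.0 + i, 2, inten)
          for i, inten in enumerate([100, 6000, 7000, 4000, 20000, 9000, 5500, 8000])]
for precursor, method, target in select_precursors(survey, "low_charge"):
    print(round(precursor.mz, 1), precursor.intensity, method, target)
```

The lower threshold for the ETcaD branch reflects the text's settings for highly charged fractions; the sketch simply switches parameter sets per fraction type.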

MS analysis for Experiment 2 was performed with the same LC gradient
and configuration but using an LTQ-Orbitrap Velos (Thermo Electron).
MS data were acquired with a data-dependent decision tree method as
described (Frese et al, 2011). Briefly, following the survey scan
(resolution 30 000 FWHM), the 10 most intense precursor ions were
subjected to HCD, ETD-IT or ETD-FT fragmentation. The most appropriate
technique for each selected precursor was chosen by a preprogrammed
data-dependent decision tree. Essentially, doubly charged peptides
were subjected to HCD fragmentation, and more highly charged peptides
were fragmented using ETD. The normalized collision energy for HCD was
set to 35%. ETD was performed with supplemental activation, and the
reaction time was set to 50 ms for doubly charged precursors.
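The charge-based routing summarized for Experiment 2 can be sketched as a tiny decision function applied to the 10 most intense precursors. This is a deliberately reduced version: the published decision tree (Frese et al, 2011) also considers precursor m/z and chooses between ETD-IT and ETD-FT, which is not modeled here, and skipping singly charged ions is an assumption, since the text does not say how they were handled. Function names are hypothetical.

```python
from typing import Optional


def fragmentation_method(charge: int) -> Optional[str]:
    """Simplified rule from the text: 2+ -> HCD, 3+ and higher -> ETD.

    Returns None for singly charged ions (an assumption, not stated in the text).
    """
    if charge == 2:
        return "HCD"
    if charge >= 3:
        return "ETD"
    return None


def plan_scans(survey, top_n=10):
    """survey: list of (mz, charge, intensity) tuples; route the top_n routable ions."""
    routable = [ion for ion in survey if fragmentation_method(ion[1]) is not None]
    ranked = sorted(routable, key=lambda ion: ion[2], reverse=True)[:top_n]
    return [(mz, charge, fragmentation_method(charge)) for mz, charge, _ in ranked]


print(plan_scans([(512.3, 2, 9e5), (688.9, 3, 4e5), (455.1, 1, 2e5), (901.4, 4, 1e5)]))
# [(512.3, 2, 'HCD'), (688.9, 3, 'ETD'), (901.4, 4, 'ETD')]
```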



Data processing 

MS data were processed and quantified with Proteome Discoverer
(version 1.3, Thermo Electron) using standardized workflows. This
ensures consistent and reproducible quantification for all samples,
avoiding possible bias introduced by manual intervention. These
workflows are available as Supplementary Materials. For Experiment 1,
peptide identification was performed with Mascot 2.3 (Matrix Science)
against a concatenated forward-decoy IPI Human database supplemented
with frequently observed MS contaminants (version 3.68, 174 650
entries). The following parameters were used: 50 p.p.m. precursor
mass tolerance, 0.5 Da fragment ion tolerance, up to 2 missed
cleavages, carbamidomethyl cysteine as a fixed modification and
oxidized methionine as a variable modification.
The dimethyl-based quantitation method was selected in Proteome
Discoverer, with a mass precision requirement of 2 p.p.m. for
consecutive precursor measurements. To account for the isotope effect
of deuterium on retention time, we applied a retention time tolerance
of 1 min for isotope pattern multiplets and allowed spectra with two
missing channels to be quantified. After identification and
quantification, we combined all results originating from the same
biological replicate and filtered them according to very stringent
peptide acceptance criteria: (i) mass deviation within ±5 p.p.m., (ii)
a Mascot ion score of at least 20, (iii) a minimum of 7 amino-acid
residues per peptide and (iv) position rank 1 in the Mascot search.
As a result, we obtained peptide FDRs of 0.23 and 0.19% for the two
biological replicates, respectively (Supplementary Table S1). Finally,
log2-transformed peptide ratios were normalized to the median. The
same criteria were subsequently applied to analyze Experiment 2,
except for the fragment ion tolerance: 0.5 Da for ETD-IT and 0.02 Da
for both HCD and ETD-FT. Peptide FDRs obtained for Experiment 2 were
0.56 and 0.59%, respectively.
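The post-search steps (peptide acceptance criteria, decoy-based FDR and median normalization of log2 ratios) can be sketched as below. This is an illustration, not the Proteome Discoverer workflow: the data class and function names are hypothetical, the FDR estimator (accepted decoys divided by accepted targets) is one common choice for concatenated target-decoy searches and is assumed here, and the `swapped` flag is a hypothetical way of putting a label-swapped replicate on the same ratio orientation, which the text does not describe.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class PeptideMatch:
    sequence: str
    observed_mass: float     # measured precursor mass (Da)
    theoretical_mass: float  # mass of the assigned peptide (Da)
    ion_score: float         # Mascot ion score
    rank: int                # position rank in the search result
    is_decoy: bool           # hit from the decoy part of the concatenated database


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def accept(m: PeptideMatch) -> bool:
    """Criteria from the text: |error| <= 5 ppm, score >= 20, >= 7 residues, rank 1."""
    return (abs(ppm_error(m.observed_mass, m.theoretical_mass)) <= 5
            and m.ion_score >= 20
            and len(m.sequence) >= 7
            and m.rank == 1)


def decoy_fdr(matches) -> float:
    """Estimated FDR = accepted decoy hits / accepted target hits (assumed estimator)."""
    accepted = [m for m in matches if accept(m)]
    decoys = sum(m.is_decoy for m in accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


def median_normalize_log2(ratios, swapped=False):
    """log2-transform, optionally invert a label-swapped replicate, subtract the median."""
    logs = [(-1 if swapped else 1) * math.log2(r) for r in ratios]
    center = median(logs)
    return [x - center for x in logs]


print(median_normalize_log2([1.0, 2.0, 4.0]))  # [-1.0, 0.0, 1.0]
```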

The MS data associated with this manuscript can be downloaded 
from ProteomeCommons.org under the following Tranche hash: 
d/Ci8K23/EI0YIsZ7OQpxQjkHUlCdAMtiBKQT6a + 4McomzkpsxZYLn 
ZYcmlvjmMhHv94w5or5jGg/6l42VgFGtFZSC0AAAAAAAAOtw==. 



Microarray analysis 

RNA isolation 

Cells were washed twice in PBS (Gibco), harvested and counted using a
NucleoCounter (Chemometec). RNA was isolated using TRIzol
(Invitrogen)/chloroform according to the manufacturer's protocol. RNA
in the extracted aqueous phase was concentrated by precipitation with
an equal volume of isopropanol at -20°C overnight and further purified
using the RNeasy Mini Kit (Qiagen). RNA quantity was measured by
NanoDrop (Thermo Scientific), and RNA quality was evaluated by
capillary electrophoresis on a QIAxcel (Qiagen).



Transcriptome analysis 

Affymetrix U133 Plus 2.0 GeneChip Microarrays (Human Genome) were
used according to the manufacturer's protocol. Briefly, the GeneChip
3' IVT Express Kit was used to generate labeled amplified RNA (aRNA)
from 500 ng of total RNA, and 12.5 µg of fragmented aRNA was
subsequently hybridized onto each microarray. Intact aRNA yield was
measured by NanoDrop, and the quality of intact and fragmented aRNA
was evaluated on a 2% agarose gel. Hybridizations were carried out at
45°C for 16 h with reagents from the GeneChip Hybridization, Wash, and
Stain Kit; arrays were subsequently washed and stained automatically
on a Fluidics Station 450 using the preprogrammed FS450_0001 protocol.
Arrays were scanned using a GeneChip Scanner 3000.



Microarray data analysis 

Affymetrix Expression Console software was used to analyze the
scanned image files. Array data were quantified and normalized using
the MAS5.0 algorithm, and detection calls were determined for each
probeset. Only data from probesets that (i) were flagged as present in
at least one set of biological replicates and (ii) had the highest
summed intensity among redundant probesets (those annotated with the
same gene name or UniGene identifier) were used for subsequent
comparison with the proteome data. Microarray data are available at
the NCBI Gene Expression Omnibus database under accession numbers
GSE26451 and GSE26453.
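The two probeset filters can be expressed compactly. This sketch assumes one reading of criterion (i), namely that a set of biological replicates counts as present only when every replicate in it has a MAS5 'P' call; the data layout, probeset identifiers and intensity values are invented for illustration.

```python
def select_probesets(probesets):
    """Keep, per gene, the present probeset with the highest summed intensity.

    probesets maps a probeset ID to (gene, calls_by_sample, intensities), where
    calls_by_sample maps each sample to the detection calls ('P', 'M' or 'A') of
    its biological replicates, and intensities are the MAS5 signal values.
    """
    best = {}
    for probeset_id, (gene, calls_by_sample, intensities) in probesets.items():
        # Criterion (i): present in all replicates of at least one sample (assumed reading).
        if not any(all(call == "P" for call in calls) for calls in calls_by_sample.values()):
            continue
        # Criterion (ii): among redundant probesets for a gene, keep the brightest.
        total = sum(intensities)
        if gene not in best or total > best[gene][1]:
            best[gene] = (probeset_id, total)
    return {gene: probeset_id for gene, (probeset_id, _) in best.items()}


# Hypothetical probesets with invented values.
example = {
    "ps_1": ("CDH2", {"hiPSC": ["P", "P"], "hESC": ["A", "P"]}, [850.0, 900.0, 120.0, 300.0]),
    "ps_2": ("CDH2", {"hiPSC": ["P", "P"], "hESC": ["P", "M"]}, [1500.0, 1400.0, 700.0, 650.0]),
    "ps_3": ("TFRC", {"hiPSC": ["A", "M"], "hESC": ["A", "A"]}, [40.0, 35.0, 20.0, 25.0]),
}
print(select_probesets(example))  # {'CDH2': 'ps_2'}
```

The retained gene-level values can then be matched to proteins by gene symbol, as described in the Bioinformatic analysis section.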



Bioinformatic analysis 

Protein classification (molecular function, biological process,
cellular component and protein class) was performed using the PANTHER
classification system (Mi et al, 2007). GO enrichment was assessed
with a binomial test as described elsewhere (Cho and Campbell, 2000),
using the entire list of identified proteins as the reference data
set. Protein and mRNA levels were matched using official gene symbols
(HUGO). Protein networks were created with STRING (Snel et al, 2000).
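A binomial GO enrichment test of the kind cited can be sketched as the upper-tail probability of observing at least k proteins of a category among n regulated proteins, when the category makes up a fraction p of the reference list (all identified proteins). This is an illustrative sketch: the text does not state the exact formulation of Cho and Campbell (2000), for instance whether a one- or two-sided test or a multiple-testing correction was applied, and the function name is hypothetical.

```python
from math import comb


def binomial_enrichment_p(k: int, n: int, in_category: int, reference_size: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with p = in_category / reference_size.

    k:              regulated proteins annotated with the GO term
    n:              total regulated proteins
    in_category:    reference proteins annotated with the GO term
    reference_size: all identified proteins (the reference data set)
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    p = in_category / reference_size
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Toy case: the category covers half the reference and all 4 regulated proteins fall in it.
print(binomial_enrichment_p(4, 4, 1, 2))  # 0.0625
```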



Statistical analysis 

Significance analysis of microarrays (SAM; Tusher et al, 2001; Roxas
and Li, 2008) was used to identify significantly regulated proteins
between hESCs, hiPSCs and fibroblasts. Only proteins quantified in
both experiments (i.e., IMR90 and 4Skin) and in both biological
replicates were included in the statistical analysis. Overall,
log2-transformed ratios for 2683 proteins were processed using the
MultiExperiment Viewer software (version 4.7.4) (Saeed et al, 2006).
Briefly, a one-class test was used (testing whether the mean log2
ratio differs from 0), 1000 permutations were chosen for a better
approximation of FDR values (Saeed et al, 2006), the s0 value was
selected according to the method proposed by Tusher et al (2001) and
q-values were calculated for each protein. Delta values were manually
adjusted to obtain an FDR of ~1%. SAM graphs for both comparisons
(hESCs/hiPSCs and fibroblasts/hiPSCs) are shown in Supplementary
Figure S10. The results of the analyses are presented in
Supplementary Table S4.
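The one-class SAM procedure can be illustrated with a simplified sketch; it is not the MultiExperiment Viewer implementation. Here s0 is passed in rather than estimated by the method of Tusher et al (2001), significance is called with a symmetric |d| cutoff instead of the delta band on the observed-versus-expected plot, and the FDR is the median number of proteins passing the cutoff under random sign flips divided by the number observed. Function names are hypothetical.

```python
import random
from statistics import mean, median, stdev


def sam_d(rows, s0):
    """One-class SAM relative difference per protein: mean / (standard error + s0).

    rows: one list of log2 ratios per protein (e.g. 2 experiments x 2 replicates).
    s0 must be positive here to avoid division by zero for constant rows.
    """
    if s0 <= 0:
        raise ValueError("this sketch requires s0 > 0")
    scores = []
    for values in rows:
        standard_error = stdev(values) / len(values) ** 0.5
        scores.append(mean(values) / (standard_error + s0))
    return scores


def sign_flip_fdr(rows, s0, cutoff, n_perm=1000, seed=0):
    """Estimate the FDR of calling |d| >= cutoff, using random sign flips as the null.

    Under the one-class null hypothesis (mean log2 ratio = 0) the sign of each
    replicate column is arbitrary, so flipping whole columns yields null d values.
    """
    rng = random.Random(seed)
    called = sum(abs(d) >= cutoff for d in sam_d(rows, s0))
    if called == 0:
        return 0.0
    n_cols = len(rows[0])
    null_counts = []
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in range(n_cols)]
        flipped = [[s * v for s, v in zip(signs, values)] for values in rows]
        null_counts.append(sum(abs(d) >= cutoff for d in sam_d(flipped, s0)))
    return min(1.0, median(null_counts) / called)


print(sam_d([[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]], s0=0.5))  # [2.0, 0.0]
```

For scale, the 12 hiPSC-enriched and 46 hESC-enriched proteins sum to 58, i.e. 58/2683 ≈ 2.2% of the proteins tested, leaving about 97.8% unchanged.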



Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 